Cellular APOBEC3A deaminase drives mutations in the SARS-CoV-2 genome

Abstract The number of genetic variations in the SARS-CoV-2 genome has been increasing primarily due to continuous viral mutations. Here, we report that the human APOBEC3A (A3A) cytidine deaminase plays a critical role in the induction of C-to-U substitutions in the SARS-CoV-2 genome. Bioinformatic analysis of the chronological genetic changes in a sequence database indicated that the largest UC-to-UU mutation signature, consistent with APOBEC-recognized nucleotide motifs, was predominant in single-stranded RNA regions of the viral genome. In SARS-CoV-2-infected cells, exogenous expression of A3A but not expression of other APOBEC proteins induced UC-to-UU mutations in viral RNA (vRNA). Additionally, the mutated C bases were often located at the tips in bulge or loop regions in the vRNA secondary structure. Interestingly, A3A mRNA expression was drastically increased by interferons (IFNs) and tumour necrosis factor-α (TNF-α) in epithelial cells derived from the respiratory system, a site of efficient SARS-CoV-2 replication. Moreover, the UC-to-UU mutation rate was increased in SARS-CoV-2 produced from lung epithelial cells treated with IFN-ß and TNF-α, but not from CRISPR/Cas9-based A3A knockout cells. Collectively, these findings demonstrate that A3A is a primary host factor that drives mutations in the SARS-CoV-2 RNA genome via RNA editing.


INTRODUCTION
Coronavirus disease 2019 (COVID- 19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread rapidly worldwide since December 2019 (1,2). Concomitantly, genomic variation in SARS-CoV-2 has increased, with different variants of SARS-CoV-2 lineages emerging in multiple geographical regions of the world. Certain SARS-CoV-2 lineages have been designated variants of concern (VOCs) because of their increased transmissibility, related disease severity and immune escape properties. These genetic changes, characteristically comprising insertions, deletions and substitutions (3), have been reported in various genomic surveillance databases, such as the Global Initiative on Sharing All Influenza Data (GI-SAID) EpiCoV (referred to as the 'GISAID' in this report), and are regarded as outcomes of viral selection for propagation in hosts. Most genetic changes in SARS-CoV-2 that negatively affect viral replication are likely unsustained, whereas a small proportion that do not affect or benefit viral infectivity, immune escape, or tolerance to antiviral treatments have been observed in the database sequences (4).
The primary sources of substitutions in the SARS-CoV-2 genome arise from viral replication errors during error-prone RNA-dependent polymerization, or ribonucleotide misincorporation. These events occur despite the reduction in the error magnitude by the viral proofreading exoribonuclease (ExoN) in the N-terminal domain of nonstructural protein 14 (nsp14), which is commonly encoded in the genomes of Coronaviridae but not in those of other RNA viruses (reviewed in ref. (5)). Additionally, an early study based on public RNA transcriptome datasets containing data on bronchoalveolar lavage fluid from COVID-19 patients suggested that host deaminases may play roles in the increased genetic variation in the SARS-CoV-2 RNA genome (6). In this study, two potential host deaminases, APOBEC polynucleotide cytidine (C) deaminases and adenosine (A) deaminase RNA specific (ADAR), were assessed for their potential effects on two types of dinucleotide mutation signatures, UC-to-UU (U, uridine) and AA-to-AG (G, guanosine) substitutions, respectively. Furthermore, sequence analysis of datasets available from GI-SAID suggested that the nucleotide changes in the SARS-CoV-2 genome during the initial months of the 2020 pandemic primarily involved C-to-U mutations, likely driven by a host APOBEC-like editing process (7,8). However, it remains unclear whether cellular APOBEC cytidine deaminases directly play a critical role in driving SARS-CoV-2 genome mutations.
In this study, we performed bioinformatic analysis of SARS-CoV-2 genome sequences in GISAID to re-evaluate whether the APOBEC-like signature of UC-to-UU muta-tions is still continuously generated in the genome. In addition, we investigated the effects of the cellular expression of APOBEC cytidine deaminases on the induction of UC-to-UU mutations in the viral RNA (vRNA) genome using live viruses. We found that endogenous and exogenous expression of A3A in cells induced UC-to-UU mutations in the SARS-CoV-2 genome. The mutation hotspots mediated by A3A were often located at the tips in the loop regions in the viral genome. Our findings in this study suggest that A3A is a major cellular factor that increases genetic variations in the SRAS-CoV-2 RNA genome.
Primary monocyte-derived macrophages (MDMs) were prepared from human peripheral blood mononuclear cells (PBMCs) of healthy donors as described previously (24). In brief, monocytes were first isolated from PBMCs using a Classical Monocyte Isolation Kit (Miltenyi Biotec). CD14 + cells were plated at 5 × 10 5 cells/well in RPMI 1640 medium (Sigma-Aldrich) supplemented with penicillin (100 U/ml) and streptomycin (100 g/ml) for 3 h prior to the addition of 10% FBS and 10 ng/ml macrophage colony stimulating factor (PeproTech). Adherent cells were cultured for 7 dys and used as MDMs.
The virus strain used in this study was SARS-CoV-2 B.1.1 (Pango Lineage B.1.1, GISAID EPI ISL 568558), which was isolated from a nasal swab sample of a patient in the Nagoya Medical Center, Japan, and then propagated in Vero E6 cells (25). Virus propagated in Vero E6 cells was stored at -80 • C for use in subsequent experiments. To determine the virus titre, Vero E6/TMPRSS2 cells cultured in 96-well plates were incubated with 100 l of serially diluted (twofold) virus in DMEM GM for 1 h at 37 • C. The infected cells were then washed once with 100 l of prewarmed fresh DMEM GM and incubated in fresh DMEM GM for 2 days at 37 • C. The cytopathic effect was evaluated by microscopy. The median tissue culture infectious dose (TCID 50 ) was determined by the Reed and Muench method (26). For analysis of vRNA, supernatants were first clarified by centrifugation at 750 × g for 10 min and were then filtered through a 0.45-m-pore membrane (Merck Millipore).

Plasmids and transfection
The expression plasmids for human AID with a C-terminal Myc-HIS tag were constructed in the pcDNA 3.1 (-) vector (Thermo Fisher Scientific). The expression plasmids for human A3A with a C-terminal Myc-HIS tag in the pcDNA 3.1 (-) vector and of A1 containing a C-terminal HA tag in the pCASSG vector, were obtained from Dr Klaus Strebel and Dr Terumasa Ikeda, respectively (27,28). The plasmids carrying A3B, APOBEC3C (A3C), APOBEC3D (A3D), APOBEC3F (A3F), and A3G in the pcDNA 3.1 (-) vector and the plasmid carrying APOBEC3H (A3H) (haplotype II) in the pTR600 vector were prepared as previously described (29)(30)(31). Notably, there was a technical difficulty in the preparation of the A3A and A3B expression plasmids that was caused partially by their high toxicity. In addition, the A3A and A3B fragments were prone to acquiring detrimental mutations during plasmid propagation in Escherichia coli. Therefore, we screened several E. coli strains that ultimately enabled us to prepare the A3A and A3B expression plasmids. Only the NEB Turbo strain (New England Biolabs) was used, and no obvious toxicity or detrimental mutations were observed during propagation.
For expression of the A1, A3 and AID proteins, each plasmid (0.5 g) was transfected into 293AT cells in 12well plates using FuGENE HD (Promega) according to the manufacturer's instructions. The cells were used for viral infection experiments thirty-six hours after transfection, and for immunoblot analysis of protein expression 48 h after transfection.

Deep sequencing and analysis of the viral genome
vRNA was extracted from 140 l of culture supernatant using the QIAamp Viral RNA Mini Kit (QIAGEN) according to the manufacturer's protocol. Each of the isolated vRNAs was dissolved in 60 l of RNase-free distilled water and stored at -80 • C until sequencing analysis was performed. The next-generation sequencing (NGS) libraries (nt 55-29835, relative to the Wuhan reference sequence, hCoV-19/Wuhan/WIV04/2019, EPI ISL 402124) were prepared with a QIAseq SARS-CoV-2 Primer Panel (QIAGEN) and a QIAseq FX DNA Library Kit (QIA-GEN), according to the manufacturer's instructions. In brief, RNA samples were converted by reverse transcription into cDNA and enriched through two multiplex targeted PCRs with two different primer pools (pool1 and pool2). The enriched products in the two pool reactions were mixed and purified with an equal volume of AMPureXP beads (Beckman Coulter). The product concentrations were then quantified using a Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific) with a Qubit 2.0 fluorometer (Thermo Fisher Scientific). The purified products (250 ng) were subjected to fragmentation, end repair, and adaptor ligation for library preparation. After the series of reactions were completed, the libraries (average of ∼500 bp) were subjected to two rounds of size selection and purification (at 0.8 × and 1 × volumes, respectively) with AM-PureXP beads (Beckman Coulter). The equimolar libraries were pooled (12 samples/pool), diluted to a final concentration of 10-13 pM, and then sequenced on the Illumina MiSeq platform using a MiSeq Reagent Kit v3 (300 cycles) (Illumina).
To identify minority mutations, the read data obtained through MiSeq were processed using our previously reported method (32) with slight modifications. In brief, the primer sequences were trimmed from the raw read sequences with the cutPrimers program (ver. 2.0) (33). The trimmed sequences were mapped to the reference sequence or a corresponding consensus sequence in the viral stock using the BWA-MEM algorithm in the BWA program (ver. 0.7.3a-r367) (34). Next, sequences with a mapping quality ≥ 60 were selected with the SAMtools program (ver. 0.1.18-r580) (35). To distinguish minority mutations from sequencing errors, a threshold was set at 1% relative abundance as the minority population, and error correction was carried out by filtering based on the per-site quality scores for each base (32). For this study, we examined minority mutations present in ≥ 1% of the population based on at least two read sequences. For experiments with the same virus isolates, we omitted shared mutations commonly observed in the isolates from the analyses. For bar graph showing the prevalence with C-to-U mutations throughout the viral genome, we grouped the mutational patterns in the dinu-cleotide context into three: UC-to-UU, VC-to-VU (V = not U), and others.

IFN treatment under normoxic and hypoxic conditions
To measure A3A mRNA levels, the indicated cells were seeded in 12-well plates. After incubation for 24 h at 37 • C, the cells were treated with human IFN beta 1a (IFN-ß) (1,000 U/ml) (PBL Assay Science) and/or recombinant human TNF-␣ (50 ng/ml) (Fujifilm Wako Chemicals) in DMEM GM under 5% CO 2 for 18 h at 37 • C (under normoxic conditions). For hypoxia treatment, immediately after adding IFN-ß and/or TNF-␣, the cells were placed in a 1% O 2 environment using a BIONIX-1 Hypoxic Culture Kit (Sugiyamagen) as previously reported (36). After incubation under hypoxic conditions for 18 h at 37 • C, total mRNA was extracted from the cells.
For SARS-CoV-2 infection, Calu-3 cells (2 × 10 5 cells/well) in 12-well plates were treated with or without IFN-ß (1,000 U/ml) and TNF-␣ (50 ng/ml) in DMEM/F-12 GM under 5% CO 2 for 18 h at 37 • C. After the cells were washed twice with 800 l of fresh prewarmed DMEM/F-12 GM, they were incubated in fresh DMEM/F-12 GM for another 8 h and were then infected with SARS-CoV-2 B.1.1 at a high multiplicity of infection (MOI) of 0.5 for 4 h. The cells were then washed twice with DMEM/F-12 GM. The cells pretreated with IFN-ß and TNF-␣ (pretreatment condition, referred to as Pretreated) were further incubated in DMEM/F-12 GM for 72 h, and the culture supernatants were harvested for virus passaging and deep sequencing. The cells not pretreated with IFN-ß and TNF-␣ treatment before infection were incubated in the presence (posttreatment condition, referred to as Posttreated) or absence of IFN-ß (1000 U/ml) and TNF-␣ (50 ng/ml) (control) for 48 h before the cell culture supernatants were harvested. Each volume of harvested virus was passaged twice in the same manner (to passage 3, P3). Notably, SARS-CoV-2 production was delayed under the pretreatment condition compared with the posttreatment and control conditions, partially because SARS-CoV-2 is highly sensitive to IFN pretreatment (37)(38)(39). Therefore, virus-infected cells cultured under the pretreatment condition were incubated 24 h longer than those cultured under the posttreatment and control conditions.

Quantitative reverse transcription-droplet digital PCR (RT-ddPCR) analysis
Total cellular RNA was isolated from cells cultured in 12well plates by using ISOGEN-LS (Nippon Gene) according to the manufacturer's protocol. For quantification of the A3A, CD11b, CD68 and housekeeping ribonuclease P protein subunit p40 (RPP40) mRNA levels, RT-ddPCR was performed using a One-Step RT-ddPCR Advanced Kit for Probes (Bio-Rad Laboratories). In brief, a total volume of 22 l per reaction was first prepared by mixing Supermix (5.5 l), nuclease-free distilled water (1.1 l), reverse transcriptase (2.2 l), 300 mM dithiothreitol (1.1 l), 55 ng of total RNA (5.5 l), and either the A3A or RPP40 TaqMan gene expression probe-primer mixture (1.1 l of the 20 × mixture). FAM-labelled A3A mRNA-specific (Hs00377444 m1), FAM-labelled CD11b mRNAspecific (Hs00355885 m1), FAM-labelled CD68 mRNAspecific (Hs00154355 m1) and VIC-labelled housekeeping RPP40 mRNA-specific (Hs01017007 m1) probe/primer mixtures (Thermo Fisher Scientific) were used. For each reaction solution, droplets were generated by loading 20 l of the reaction mixture into a QX200 Droplet Digital PCR System (Bio-Rad Laboratories), and PCR amplification was performed with a ProFlex PCR system (Thermo Fisher Scientific) under the following thermal cycling conditions: reverse transcription for 60 min at 42 • C, enzyme activation for 10 min at 95 • C, 40 cycles of denaturation for 0.5 min at 95 • C and a subsequent extension for 1 min at 60 • C, and enzyme deactivation for 10 min at 98 • C. After the PCRs, the reaction droplets were analysed with a QX200 Droplet Digital PCR system and QuantaSoft software (ver. 1.7) (Bio-Rad Laboratories). The ratios of the A3A, CD11b and CD68 mRNA copy numbers relative to that of RPP40 in equal amounts of total RNA were calculated.

Database analysis of viral genome sequences and RNA secondary structures
The full genome (>29 000 nt) sequences (approximately 12.1 million sequences) downloaded on 24 July 2022, from the GISAID EpiCoV database (https://www.gisaid.org/) were used in this study. First, genome sequences with ambiguous bases or without a sampling date were excluded from the downloaded sequence file. Next, to attain a level of confidence for sequences targeting transmittable viruses, we extracted all the sequences in which the viral proteincoding region (nt 266-29 674 relative to the Wuhan reference sequence) was matched with one or more sequences. That is, genome sequences that did not have identical genotyped sequences in the database were excluded. In total, 2 051 393 sequence datasets were used for subsequent analyses (provided upon request). The metadata corresponding to the individual sequences, such as viral lineage information, were also obtained from the GISAID EpiCoV.
Nucleotide substitutions within individual genome sequences were identified as follows: First, sequences were mapped to the Wuhan reference sequence using the min-imap2 program (ver. 2.17-r974) with the following options:a -A 2 -O 24, 24 -E 2,2. Alignment data in CIGAR strings in the output SAM file were converted into a simple sequence alignment format as a FASTA file with an in-house program (provided upon request). Substitutions were counted based on the alignment information obtained with in-house alignment programs (provided upon request). Mononucleotide substitutions were simply counted by comparison with the reference sequence. To analyse substitutions at di-and trinucleotide motifs, substitutions were scored by considering the mono-or dinucleotide sequences immediately upstream of the substituted positions. In addition, to determine whether each substitution was synonymous or nonsynonymous, we examined whether the individual nucleotide substitution resulted in an amino acid change, considering the in-frame trinucleotide sequences flanking the substituted position. We referred to the RNA secondary structure of the SARS-CoV-2 genome, which was previously determined based on in vivo SHAPE data (40), and Nucleic Acids Research, 2023, Vol. 51, No. 2 787 visualized it with the online application mFold web server (41) for verification.

Mutational signatures in the SARS-CoV-2 sequence database
As an initial step towards identifying the mutator(s) involved in the vRNA mutations, we analysed chronological changes in the SARS-CoV-2 genome datasets (∼12.1 million sequences) reported in GISAID throughout the period of ∼2.5 years from December 2019 through June 2022. To increase the accuracy of the analysis, the genome sequences (2 051 393 sequences) were first extracted according to the following three criteria: (i) ambiguous bases were not included in the sequence; (ii) the sampling date was indicated and (iii) more than two matched sequences in the entire viral protein-coding region were found on the basis of a high probability of viral transmissibility. In the extracted sequence dataset, the rate of total mononucleotide substitutions was ∼26.9 nucleotides (nt) per year (nt/yr) in the 2.5-year period assessed, nearly identical to the realtime estimate reported by Nextstrain (30.6 nt/yr) (42). As observed in the early period, the C-to-U substitution rate remained high throughout the chronological period (10.2 nt/year), whereas its complementary G-to-A substitution rate was ∼3.1 nt/year, which was within the rate range of other mutations ( Figure 1A). These results demonstrated that strand-biased C-to-U mutations in the SARS-CoV-2 genome continued to occur for 2.5 years. This chronological trend of C-to-U substitutions has also been observed in geographically distinct regions throughout the world (Supplementary Figure S1).
Next, focusing on the dinucleotide context, we found that the average numbers of genomic AC-to-AU and UC-to-UU substitutions were 6.8 and 5.2, respectively, higher than those of the other two dinucleotide substitutions ( Figure  1B). Because the SARS-CoV-2 VOC Delta has a high ACto-AU substitution rate, as detected in sequence datasets (34.3%), the overall AC-to-AU substitution rate was higher than that reported previously (7,8). This explanation is substantiated by evidence showing that only the UC-to-UU substitution increased over time, independent of the SARS-CoV-2 lineage (Supplementary Figure S2). In addition, mapping analysis of the mutation signatures in the vRNA secondary structure, which was previously determined by Manfredonia et al. (40) using SHAPE, demonstrated that the UC-to-UU substitutions were highly detectable in nonduplex regions of the vRNA, typically hairpin, bulge or single-stranded regions ( Supplementary Figure S3A and B). These nucleotide preferences suggested that host cytidine deaminases specific for single-stranded polynucleotides are continuously involved in the generation of UC-to-UU mutations in the SARS-CoV-2 genome.
Interestingly, in the viral protein-coding regions, the prevalence of synonymous UC-to-UU mutations per site was significantly higher than that of nonsynonymous UCto-UU mutations (Supplementary Figure S3C), suggesting that the number of SARS-CoV-2 genome mutations driven by host factors and/or viral replication errors might be underestimated in sequence datasets. This underestimation is most likely attributable to certain selective pressures. Importantly, comparative analysis of the substitution frequencies in trinucleotide motifs indicated that the numbers of UUC-to-UUU and ACC-to-ACU mutations were significantly higher than those of mutations in other motifs (Figure 1C). These mutation signatures were consistent with the preferential sequence motifs of APOBEC-mediated deamination and different from RNA polymerization error patterns induced by nsp12s in the related coronaviruses, mouse hepatitis virus (MHV) (43,44) and SARS-CoV-1 (43,45).

Impacts of exogenous cytidine deaminase expression on the viral genome
To identify the APOBEC family member(s) that potentially drive viral C-to-U mutations, we performed an in vitro mutation prevalence analysis of the vRNA genome sequences of viruses produced from 293T cells stably expressing human ACE2 and TMPRSS2 (293AT cells). The cells were transiently transfected with DNA encoding AID, A1, or one of the seven human A3s. The amount of each epitopetagged AID, A3 or A1 in the 293AT cells was confirmed by immunoblot analysis (Figure 2A). Twenty-four hours after infection of the transfected 293AT cells, the culture supernatant was subjected to NGS analysis with the Illumina MiSeq system to identify mutations at each viral genomic position. Mutations found in ≥1% of the population at each position were scored to distinguish minority mutations from sequencing errors (32). Ten positions with UC-to-UU mutation were detected in the vRNA genome of SARS-CoV-2 produced in 293AT cells expressing A3A but not in the genome of SARS-CoV-2 produced in cells expressing any other APOBEC protein ( Figure 2B). Expression of the catalytically inactive A3A mutant (A3A E72Q) induced a minimum number of UC-to-UU mutations (Supplementary Figures S4A and B). Interestingly, these A3A-mediated mutations were also observed in other SARS-CoV-2 lineages, namely, the VOCs Alpha, Beta and Gamma (Supplementary Figure S4). These results demonstrated that exogenous expression of A3A but not of the other APOBEC family members increases UC-to-UU mutations in the SARS-CoV-2 RNA genome in vitro. Furthermore, we tested the effects of A3A on the mutation prevalence in the SARS-CoV-2 genome with different expression levels of A3A or A3A E72Q. The results showed that the UC-to-UU mutation prevalence was increased depending on expression levels of wild-type (WT) A3A, but not on those of A3A E72Q (Supplementary Figure S5). The viral titers were similar between the presence and absence of A3A, suggesting that A3A exerts no strong inhibitory effect on SARS-CoV-2 replication cycle.
We mapped the mutation frequencies per site to the SARS-CoV-2 reference sequence and identified 10 loci (I-X) with a high frequency of UC-to-UU mutations ( Figure  2C). These loci were located at dispersed positions throughout the entire genomic sequence. Importantly, the edited C bases were located on the 3 side in middle positions of bulge or loop regions in the vRNA secondary structure (Figure 2C). Moreover, most substitutions in the VOCs Alpha, Beta and Gamma were detected predominantly at these same genomic positions (Supplementary Figure S4C); how-   ever, other UC-to-UU mutations with low prevalence were mapped to hairpin loop regions that differed among the lineages. These results indicated that A3A-mediated substitutions in the SARS-CoV-2 RNA genome are strand-and sequence motif-specific and occur preferentially at the tips in loop and bulge regions in the vRNA secondary structure. Notably, these features of A3A-mediated substitutions in the SARS-CoV-2 genome appeared similar to those of cellular RNA editing by A3A in macrophages (18) and DNA oligonucleotide deamination by A3A in vitro (46). Moreover, A3A-mediated mutations in the SARS-CoV-2 genome are most likely conserved among Catarrhini (primates), because human, chimpanzee, cynomolgus macaque and rhesus macaque A3A orthologues induced UC-to-UU substitutions at similar loci in the viral genome (Supplementary Figure S6). The mutation prevalence displays higher in the presence of human and chimpanzee A3As than the macaque A3As, likely because rhesus A3A has lower catalytic activity than human A3A (47).

A3A expression in respiratory epithelial cells
Since it is unknown whether A3A is expressed in epithelial cells in the airway and lung tissues, which are primary targets for SARS-CoV-2 infection, we performed quantitative RT-ddPCR to assess the expression of A3A mRNA in cell lines and primary cells of the airway and lung. As shown in Figure 3A, constitutive (control) and IFN-ß-induced expression (+IFN-ß) of A3A mRNA were detected in a cell line-dependent manner, i.e. negligible A3A mRNA expression was observed in 16HBE14o-(a human bronchial epithelial cell line) and MRC-5 (a human foetal lung fibroblast line) cells, whereas high A3A mRNA expression was found in the human lung adenocarcinoma-derived (epithelial-like) cell lines A549 and Calu-3. Moreover, TNF-␣ had an additive effect on IFN-ß-induced A3A mRNA expression in A549 and Calu-3 cells. In contrast, A3A was constitutively expressed and was excessively expressed after induction by IFN-ß alone or IFN-ß + TNF-␣ in all primary cells: primary SAE cells showed increases of 29-and 267-fold; human AT2 cells showed increases of 51-and 177-fold; primary human LBE cells showed increases of 59-and 465fold; and HNEpCs showed increases of 20-and 1,120-fold, by IFN-ß alone and IFN-ß + TNF-␣, respectively. Hypoxia treatment (1% oxygen for 18 h) slightly increased IFN-ßinduced A3A expression in primary cells, LBE cells (1.4fold) and HNEpCs (5.7-fold). Treatment with type II IFN or type III IFNs also induced A3A expression in Calu-3 cells and HNEpCs (Supplementary Figure S7). In contrast, the mRNA expression levels of CD11b and CD68, which are expressed primarily in myeloid lineage cells, such as macrophages, were comparable between Calu-3 cells and primary epithelial cells (SAEs, AT2 cells, LBE cells and HNEpCs) ( Figure 3B). This result suggested that the primary epithelial cells used in this study were not significantly contaminated by macrophages, which are abundant in the airways and lungs, as a potential source of the A3A gene.
The A3A protein level was correlated with the A3A mRNA expression level in HNEpCs ( Supplementary Figure S7). (Notably, we did not compare A3A protein and mRNA expression in Calu-3 cells because the available anti-A3A antibodies failed to generate sufficient titres for detection by immunoblotting). These results demonstrated that IFN-inducible A3A is expressed in airway and lung epithelial cells, where SARS-CoV-2 replicates efficiently. Interestingly, PMA treatment induced a drastic increase in A3A mRNA expression (∼1.8 × 10 6 -fold relative to control) in HNEpCs, similar to the previously observed effect of phorbolin-1 on keratinocytes (11).

Endogenous A3A expression and viral mutation signatures
To validate A3A-mediated vRNA editing in virus-targeted cells, we analysed the C-to-U substitution rates in vRNA of SARS-CoV-2 produced in lung-derived epithelial cells with or without IFN-ß and TNF-␣ stimulation. Because primary cells were less sensitive than Calu-3 cells to SARS-CoV-2 (the B.1.1 lineage strain) in our infection experiments, Calu-3 cells were used for vRNA measurement. Calu-3 cells were infected with virus and were left untreated ( Figure  4A, control) or treated with IFN-ß and TNF-␣ before (Figure 4A, Pretreated) or after ( Figure 4A, Posttreated) infection. SARS-CoV-2 is highly sensitive to IFN pretreatment (37)(38)(39), leading to an insufficient virus yield for adequate genome coverage at the proper sequencing depth for NGS. Moreover, SARS-CoV-2 antagonizes IFN signalling by expressing viral products such as ORF6 (39,48) (as previously reviewed (49)), which may interfere with IFN-stimulated A3A induction. Therefore, we purposely set a time interval between IFN-ß treatment and virus inoculation to allow stable accumulation of the A3A enzyme in cells. In addition, the harvested virus (P1) was serially passaged two times (P3) following the same treatment strategy. Mutations in P1 and P3 samples were analysed by NGS, scored, and mapped to the viral genome. UC-to-UU substitutions (prevalence of ≥1%) were detected only in the pretreated samples (Figure 4A). Additionally, the substitutions detected in P1 differed from those detected in P3, suggesting that mutations are sporadically induced and selected during virus passaging. Because minor mutations other than UC-to-UU substitutions (prevalence of ≥1%) were also detected in the pretreated P1 and P3 samples (for example, A-to-C, G-to-U and A-to-G mutations at nt positions 11 075, 11 219, 14 712 and 16 807, respectively, in P3) ( Figure 4A), treatment with IFN-ß and TNF-␣ might slightly increase the rates of other substitution patterns.
Finally, to clarify whether the observed UC-to-UU substitutions were induced by A3A in Calu-3 cells, we analysed viral genome mutations in SARS-CoV-2 produced in A3A-knockout (KO) Calu-3 cells treated with or without IFN-ß and TNF-␣. A3A-KO Calu-3 cells were generated by the CRISPR/Cas9 system targeting the A3A exon 2 region (Supplementary Figure S8). In two A3A-KO cell clones, #15 and #24, carrying A3A alleles with 37-bp and 76-bp deletions, A3A mRNA expression was abolished in the presence of IFN-ß and TNF-␣ ( Supplementary Figure S8C). Then, A3A-KO Calu-3 (clone #15) cells were infected with SARS-CoV-2 and were left untreated (control) or treated with IFN-ß and TNF-␣ before or after infection, similar to the treatment scheme for Calu-3 WT cells. Mutations in viral genomes in the supernatants were analysed by NGS, and no significant UC-to-UU substitutions (prevalence of ≥ 1%) were observed in the pretreated sample of A3A-KO Calu-3 cells ( Figure 4B). In contrast, two non-Cto-U mutations were detected in the pretreated P1 sample of A3A-KO Calu-3 cells: these mutations may be induced independent of A3A in the presence of IFN-ß and TNF-␣. These data suggested that the IFN-ß-and TNF-␣-induced upregulation of A3A expression was causally connected to the elevation in the UC-to-UU mutation signature of epithelial cells.

DISCUSSION
In this study, we demonstrated that expression of exogenous A3A but not other APOBEC enzymes induced UC-to-UU mutations in the SARS-CoV-2 vRNA genome in vitro ( Figure 2). These mutation signatures mediated by A3A expression, such as the seven major mutation sites (I-III and VII-X) ( Figure 2C), were also observed in multiple lineages in the GISAID data. In addition, we found that upregulation of endogenous A3A expression led to an increase in UC-to-UU mutations in the viral genome in SARS-CoV-2 produced from Calu-3 cells pretreated with IFN-ß and TNF-␣ ( Figure 4). CRISPR/Cas9-mediated A3A KO in Calu-3 cells abrogated the increase in these viral UC-to-UU mutations. This evidence strongly supports the hypothesis that A3A-mediated vRNA editing is a primary mechanism of SARS-CoV-2 mutation by a host factor. However, the mutation positions frequently observed in viral genomes were not completely consistent between the epidemiological datasets and our in vitro experimental data based on exogenous A3A expression (i.e. we found that three major sites, IV-VI, were different between these data sources). In addition, the UC-to-UU mutation rate in the spike (S) region was relatively low according to an epidemiological database but high in our in vitro experimental data ( Figure  2C). These discrepancies are likely because the S mutation signatures found in the datasets result from strong immune selection pressure (3). Importantly, expression of exogenous A3A yielded no significant impact on the viral titer (Supplementary Figure S5D), suggesting that A3A does not contribute to restricting the SARS-CoV-2 replication cycle for a short period. In contrast, the results of our epidemiological analysis showed that the UC-to-UU mutations that are preferentially mediated by A3A chronologically continue to occur in the SRAS-CoV-2 genome (Figure 1). Therefore, A3A appears to play a critical role to increase genetic variations in the SARS-CoV-2 genome for a long period, possibly under neutral viral evolution.
Recently, during the preparation of our manuscript, Kim et al. reported the involvement of vRNA editing mediated by A1, A3A and A3G in the occurrence of biased C-to-U substitutions (50), which differs from our results herein. This discrepancy may be largely due to the different in vitro assay systems used. Kim et al. coexpressed each of seven 200-nt SARS-CoV-2 RNA fragments with A1, A3A, or A3G in 293T cells by DNA transfection and analysed the substitution rates by NGS. In contrast, we infected 293AT cells transiently expressing APOBEC proteins with live SARS-CoV-2 and analysed the mutation rates of the whole viral genome by NGS (Figure 2). In cells, SARS-CoV-2 vRNA is localized predominantly in the cytoplasm where replication occurs; however, RNA fragments expressed by transfection are generally localized in both the nucleus and cytoplasm. Since APOBECmediated deamination events should depend on the specific localization of each enzyme in cells, assessments of RNA editing in SARS-CoV-2 genome fragments expressed by transfection may produce misleading results. For example, A1/APOBEC1-complementation factor (A1CF)mediated deamination occurs predominantly in the nucleus (51), where SARS-CoV-2 vRNA is not present. Additionally, whether A1 and A3G are physiologically involved in vRNA substitutions remains unclear, because the A1 and A3G enzymes are expressed predominantly in intestinal cells (20,52) and lymphoid/myeloid cells (15), respectively, which are not primary targets for SARS-CoV-2 replication and transmission. Therefore, the possibility that both A1 and A3G are involved in driving U-to-C mutations in the SARS-CoV-2 RNA genome under physiological and virological conditions appears unconvincing. Furthermore, although previous reports have shown that A3G deaminates cellular RNAs in macrophages (23), whether A3G catalyses the deamination of RNA substrates remains controversial for two reasons: 1) previous biochemical analyses using in vitro reaction systems demonstrated that A3G exclusively deaminates cytidines in ssDNA and not in RNA substrates (53)(54)(55), and 2) accumulated studies on A3G-mediated antiretroviral activity have shown that A3G incorporated in retroviral particles does not exhibit deamination activity targeting cytosines in retroviral RNA (10). Further investigations on RNA editing events mediated by A3G in cells are required.
Another deaminase, A3B, which induces mutational signatures similar to those induced by A3A in the chromosomal DNA genome (56), exerted no significant effect on the C-to-U substitution rate in the SARS-CoV-2 genome (Figure 2). The in vitro data was consistent with the epidemiological evidence showing that the chronological C-to-U mutation rates were equivalent between sequences of SARS-CoV-2 isolated in different global regions, including Oceania and the Malay Archipelago (Supplementary Figure S1), which exhibit high frequencies of the A3B gene deletion polymorphism (57). Additionally, A3B localizes predominantly to the nucleus (58), which is not the site of SARS-CoV-2 replication.
However, a question has been raised about how A3A efficiently deaminates both RNA and ssDNA but A3G and A3B exclusively deaminate ssDNA despite the phylogenetic similarity of the catalytic domains in the A3G and A3B Nucleic Acids Research, 2023, Vol. 51, No. 2 793 C-terminal domains (CTDs) to A3A as the same Z1 type (10). Structural information might provide important insight to answer this question. Our results ( Figure 2C) and the results of other previous studies have shown that A3A preferentially deaminates cytosines in hairpin sequences of ssDNA (56,59) and ssRNA (18). Based on the molecular structures of the A3A-ssDNA complex (Protein Data Bank ID 5SWW) (59), the ssDNA substrate containing three nucleotides -i.e. the deamination target deoxycytidine (dC 0 ) and the flanking deoxycytidines (dT -1 and dT -2 ) -are docked in the substrate cavity of A3A (60). The trinucleotide recognized by A3A adopts the C2 -endo sugar pucker conformation at dC 0 and dT -1 . In general, ssRNA is predominantly in the C3 -endo conformation, although the riboses at the tips of RNA hairpins tend to adopt the C2endo sugar pucker conformation (61). Therefore, A3A preferentially recognizes ssDNA as well as RNA hairpins (with a nonhelical C2 -endo sugar pucker conformation similar to ssDNA), as supported by our structural modelling analysis (Supplementary Figure S9A). However, the CTDs of both A3B and A3G have bulky loop 1 regions that protrude into the substrate binding pockets and prevent the accommodation of hairpin RNAs (Supplementary Figure  S9). Structural determinations of the A3A-hairpin RNA complex are warranted to indicate the mechanistic hypothesis that among the A3 family enzymes, only A3A efficiently deaminates both ssDNA and RNA.
In SARS-CoV-2-infected cells, vRNAs, including negative-strand vRNA (-vRNA) intermediates, are synthesized in an occluded cellular compartment called a double-membrane vesicle (DMV), which prevents the detection of vRNA intermediates by innate immune sensors (62,63). Since A3A appears to have difficulty targeting -vRNA in DMVs for deamination, the strand-biased UCto-UU substitutions mediated by A3A are the most likely to be detected in the SARS-CoV-2 genome. SARS-CoV-2 replication is largely divided into two phases, post-virus entry through DMV compartmentalization (the early phase) and virus assembly (the late phase), in which A3A effectively targets the vRNA genome in the cytoplasm. As shown in Figure 4, IFN-ß and TNF-␣ treatment before SARS-CoV-2 infection -but not after infection -increased the number of C-to-U substitutions in the vRNA genome in Calu-3 cells. This early-phase dependency might be explained by two mechanisms. (i) SARS-CoV-2 may be more sensitive to A3A-mediated editing in the early phase of virus replication, or (ii) A3A may not be sufficiently induced by IFN-ß because certain viral gene products, such as ORF6, suppress IFN signalling (39,48), which is required for A3A induction. To date, we have not identified A3A inducers (except for the combination of IFN-ß and TNF-␣) that are truly active in COVID-19 patients in vivo. Since PMA induced a drastic increase in the A3A level in primary nasal epithelial cells (Supplementary Figure  S7), PMA-related signalling pathways might be key for clarifying A3A regulation in bronchial tissue epithelial cells.
Overall, our study provides multiple insights into genetic changes in SARS-CoV-2 mediated in association with host factors. To date, cytidine deamination of DNA by A3A has been extensively studied in terms of antiviral (mainly antiretroviral) activity. All these findings suggest new possibilities for understanding the effects of A3A on the genetic diversification of RNA viruses, such as SARS-CoV-2 and other Coronaviridae family members that replicate efficiently in the respiratory epithelium.

DATA AVAILABILITY
The SARS-CoV-2 pangenomic sequence data used in this study are available at the GISAID EpiCoV portal (https: //www.gisaid.org/). The deep sequencing data obtained in this study have been deposited in the DNA Data Bank of Japan (DDBJ) under accession ID PRJDB13356.